SGML exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

On the use of hierarchical information in sequential mining-based XML document similarity computation

Internal identifier: 000959 (Main/Exploration); previous: 000958; next: 000960

Authors: Ho-Pong Leung [Hong Kong]; Fu-Lai Chung [Hong Kong]; Stephen Chi-Fai Chan [Hong Kong]

Source: Knowledge and Information Systems, vol. 7, no. 4, pp. 476-498 (2005)

RBID : ISTEX:5EE2A64E8236341BE836AE61001457D8564C558C

English descriptors: Information retrieval; Sequential mining; XML structural similarity

Abstract

Abstract: Measuring the structural similarity among XML documents is the task of finding their semantic correspondence and is fundamental to many web-based applications. While there exist several methods to address the problem, the data mining approach seems to be a novel, interesting and promising one. It explores the idea of extracting paths from XML documents, encoding them as sequences and finding the maximal frequent sequences using the sequential pattern mining algorithms. In view of the deficiencies encountered by ignoring the hierarchical information in encoding the paths for mining, a new sequential pattern mining scheme for XML document similarity computation is proposed in this paper. It makes use of a preorder tree representation (PTR) to encode the XML tree’s paths so that both the semantics of the elements and the hierarchical structure of the document can be taken into account when computing the structural similarity among documents. In addition, it proposes a postprocessing step to reuse the mined patterns to estimate the similarity of unmatched elements so that another metric to qualify the similarity between XML documents can be introduced. Encouraging experimental results were obtained and reported.
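
The path-encoding idea described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's PTR encoding, which this page does not specify. The function name `preorder_paths`, the `(preorder index, depth, tag)` triple and the sample document are all illustrative assumptions. The sketch only shows how root-to-leaf paths might carry hierarchical information alongside element names.

```python
import xml.etree.ElementTree as ET


def preorder_paths(xml_text):
    """Return every root-to-leaf path as a list of (preorder_index, depth, tag).

    Hypothetical encoding for illustration only; not the paper's PTR scheme.
    """
    root = ET.fromstring(xml_text)
    paths = []
    counter = 0  # position of the node in a preorder traversal, starting at 1

    def visit(node, depth, prefix):
        nonlocal counter
        counter += 1
        current = prefix + [(counter, depth, node.tag)]
        children = list(node)
        if not children:
            # A leaf closes one root-to-leaf path.
            paths.append(current)
        for child in children:
            visit(child, depth + 1, current)

    visit(root, 0, [])
    return paths


# Invented sample document, not taken from the paper.
doc = "<book><title/><author><name/></author></book>"
paths = preorder_paths(doc)
for path in paths:
    print(path)
```

Sequences of this form could then be passed to a sequential pattern miner. That step is not shown here.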

Url: https://api.istex.fr/ark:/67375/VQC-FBGPCSL5-D/fulltext.pdf
DOI: 10.1007/s10115-004-0156-7


Affiliations: Department of Computing, Hong Kong Polytechnic University, Hunghom, Kowloon, Hong Kong


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">On the use of hierarchical information in sequential mining-based XML document similarity computation</title>
<author>
<name sortKey="Leung, Ho Pong" sort="Leung, Ho Pong" uniqKey="Leung H" first="Ho-Pong" last="Leung">Ho-Pong Leung</name>
</author>
<author>
<name sortKey="Chung, Fu Lai" sort="Chung, Fu Lai" uniqKey="Chung F" first="Fu-Lai" last="Chung">Fu-Lai Chung</name>
</author>
<author>
<name sortKey="Chan, Stephen Chi Fai" sort="Chan, Stephen Chi Fai" uniqKey="Chan S" first="Stephen Chi-Fai" last="Chan">Stephen Chi-Fai Chan</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:5EE2A64E8236341BE836AE61001457D8564C558C</idno>
<date when="2004" year="2004">2004</date>
<idno type="doi">10.1007/s10115-004-0156-7</idno>
<idno type="url">https://api.istex.fr/ark:/67375/VQC-FBGPCSL5-D/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001910</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001910</idno>
<idno type="wicri:Area/Istex/Curation">001384</idno>
<idno type="wicri:Area/Istex/Checkpoint">000888</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000888</idno>
<idno type="wicri:doubleKey">0219-1377:2004:Leung H:on:the:use</idno>
<idno type="wicri:Area/Main/Merge">000968</idno>
<idno type="wicri:Area/Main/Curation">000959</idno>
<idno type="wicri:Area/Main/Exploration">000959</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">On the use of hierarchical information in sequential mining-based XML document similarity computation</title>
<author>
<name sortKey="Leung, Ho Pong" sort="Leung, Ho Pong" uniqKey="Leung H" first="Ho-Pong" last="Leung">Ho-Pong Leung</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Department of Computing, Hong Kong Polytechnic University, Hunghom, Kowloon</wicri:regionArea>
<wicri:noRegion>Kowloon</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Chung, Fu Lai" sort="Chung, Fu Lai" uniqKey="Chung F" first="Fu-Lai" last="Chung">Fu-Lai Chung</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Department of Computing, Hong Kong Polytechnic University, Hunghom, Kowloon</wicri:regionArea>
<wicri:noRegion>Kowloon</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Hong Kong</country>
</affiliation>
</author>
<author>
<name sortKey="Chan, Stephen Chi Fai" sort="Chan, Stephen Chi Fai" uniqKey="Chan S" first="Stephen Chi-Fai" last="Chan">Stephen Chi-Fai Chan</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Department of Computing, Hong Kong Polytechnic University, Hunghom, Kowloon</wicri:regionArea>
<wicri:noRegion>Kowloon</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Knowledge and Information Systems</title>
<title level="j" type="sub">An International Journal</title>
<title level="j" type="abbrev">Knowl Inf Syst</title>
<idno type="ISSN">0219-1377</idno>
<idno type="eISSN">0219-3116</idno>
<imprint>
<publisher>Springer-Verlag; www.springeronline.com</publisher>
<pubPlace>London</pubPlace>
<date type="published" when="2005-05-01">2005-05-01</date>
<biblScope unit="volume">7</biblScope>
<biblScope unit="issue">4</biblScope>
<biblScope unit="page" from="476">476</biblScope>
<biblScope unit="page" to="498">498</biblScope>
</imprint>
<idno type="ISSN">0219-1377</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0219-1377</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Information retrieval</term>
<term>Sequential mining</term>
<term>XML structural similarity</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Measuring the structural similarity among XML documents is the task of finding their semantic correspondence and is fundamental to many web-based applications. While there exist several methods to address the problem, the data mining approach seems to be a novel, interesting and promising one. It explores the idea of extracting paths from XML documents, encoding them as sequences and finding the maximal frequent sequences using the sequential pattern mining algorithms. In view of the deficiencies encountered by ignoring the hierarchical information in encoding the paths for mining, a new sequential pattern mining scheme for XML document similarity computation is proposed in this paper. It makes use of a preorder tree representation (PTR) to encode the XML tree’s paths so that both the semantics of the elements and the hierarchical structure of the document can be taken into account when computing the structural similarity among documents. In addition, it proposes a postprocessing step to reuse the mined patterns to estimate the similarity of unmatched elements so that another metric to qualify the similarity between XML documents can be introduced. Encouraging experimental results were obtained and reported.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Hong Kong</li>
</country>
</list>
<tree>
<country name="Hong Kong">
<noRegion>
<name sortKey="Leung, Ho Pong" sort="Leung, Ho Pong" uniqKey="Leung H" first="Ho-Pong" last="Leung">Ho-Pong Leung</name>
</noRegion>
<name sortKey="Chan, Stephen Chi Fai" sort="Chan, Stephen Chi Fai" uniqKey="Chan S" first="Stephen Chi-Fai" last="Chan">Stephen Chi-Fai Chan</name>
<name sortKey="Chung, Fu Lai" sort="Chung, Fu Lai" uniqKey="Chung F" first="Fu-Lai" last="Chung">Fu-Lai Chung</name>
</country>
</tree>
</affiliations>
</record>
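
As a rough illustration of reading such a record programmatically, the sketch below extracts the title, author surnames and DOI with Python's standard `xml.etree.ElementTree`. It works on a trimmed, simplified fragment rather than the full record: the record above uses a `wicri:` prefix with no namespace declaration, which a namespace-aware parser rejects unless a declaration is added first. The flattened nesting and the variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Simplified fragment based on the record above; wicri: attributes are omitted
# because the prefix is not declared in the record as shown.
fragment = """<record>
<titleStmt>
<title xml:lang="en">On the use of hierarchical information in sequential mining-based XML document similarity computation</title>
<author>
<name sortKey="Leung, Ho Pong" first="Ho-Pong" last="Leung">Ho-Pong Leung</name>
</author>
<author>
<name sortKey="Chung, Fu Lai" first="Fu-Lai" last="Chung">Fu-Lai Chung</name>
</author>
<author>
<name sortKey="Chan, Stephen Chi Fai" first="Stephen Chi-Fai" last="Chan">Stephen Chi-Fai Chan</name>
</author>
</titleStmt>
<idno type="doi">10.1007/s10115-004-0156-7</idno>
</record>"""

root = ET.fromstring(fragment)

# First <title> element anywhere below the root.
title = root.findtext(".//title")
# Surnames come from the "last" attribute of each <name> element, in document order.
authors = [name.get("last") for name in root.iter("name")]
# The DOI is the <idno> element whose type attribute equals "doi".
doi = root.findtext(".//idno[@type='doi']")

print(title)
print(authors)
print(doi)
```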

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000959 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000959 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Informatique
   |area=    SgmlV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:5EE2A64E8236341BE836AE61001457D8564C558C
   |texte=   On the use of hierarchical information in sequential mining-based XML document similarity computation
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jul 1 14:26:08 2019. Site generation: Wed Apr 28 21:40:44 2021